When using cloud services, you often have to worry about security, and resort to cryptographic techniques such as
digital signatures and encryption. There are many reasons for doing this.
You may be storing sensitive data (for example, people’s health reports) and
you are mandated by law to provide extra protection. You may be protecting
sensitive data regarding your business that you don’t want in the wrong
hands. Or you could simply be paranoid and not trust Microsoft or any other
cloud provider. Whatever your reason, you can take several steps to add
further levels of protection to your data.
Note: All of this has very little to do with your trust for Microsoft or
any other cloud provider. All cloud providers (and Microsoft is no
different) have multiple levels of security and several checks to ensure
that unauthorized personnel cannot access customer applications or data.
However, you often have little choice about whether to trust a cloud
provider. You might be mandated to add further levels of security by anything
from internal IT policy to financial regulations and compliance
laws.
This chapter is slightly different from all the others in that a
majority of the discussion here is devoted to looking at security and
cryptography. Frankly, the code and techniques used in this chapter could
just as easily be used for data on a file server as for data in the cloud.
Why does it show up in a book on cloud computing, then?
The decision to include an examination of security and cryptography
resulted from two key motivations. First, this is useful to a lot of people
when they have to build applications with highly sensitive data. More
importantly, it is so difficult to get this stuff right
that any time invested in examining good security and cryptographic
techniques is well worth it.
The danger of having an insecure system is known to everyone.
This chapter shows you how to build a secure backup system for your
files. It will cover how to use the right kinds of cryptographic practices
and blob storage features to ensure some security properties.
Note: This chapter is not meant as a comprehensive introduction to
cryptography. If you’re interested in that, Practical
Cryptography by Niels Ferguson and Bruce Schneier (Wiley) is a good place to start. A quick web
search brings up a lot of good references as well.
1. Developing a Secure Backup System
In this chapter, you will learn how to build a secure system that
should satisfy even the most “paranoid” conspiracy theorist. You will
discover how a real-world application (hopefully, a useful one) will use
the blob service, as well as the challenges and trade-offs involved.
Finally, you will learn how to code a nontrivial application.
Note: Don’t infer that the word paranoid suggests
that these techniques aren’t relevant to normal users. Mentally insert
“highly conservative from a security perspective” whenever you see the
word paranoid throughout the ensuing
discussions.
The application has a highly creative name, Azure Backup (azbackup),
and it is quite simple to use. It mimics the tar utility that ships with most modern Unix
systems. Rather than bundling multiple files and directories into a single
backup on disk, azbackup
compresses them and backs them up to Windows Azure blob storage.
The tool tars multiple files
and directories together into one big file (in exactly the same manner as
the Unix tar command). This tar file is then compressed using the popular
gzip algorithm.
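The tar-then-gzip step described above can be sketched with Python’s standard tarfile module. This is a minimal illustration and not the azbackup code itself; the function names make_archive and list_archive are hypothetical, and the sketch only writes the archive to local disk, leaving out encryption and the upload to blob storage.

```python
import os
import tarfile


def make_archive(paths, archive_path):
    """Bundle files and directories into one gzip-compressed tar file.

    Hypothetical helper for illustration; not taken from azbackup.
    """
    # Mode "w:gz" writes a tar stream and gzip-compresses it as it goes.
    archive = tarfile.open(archive_path, "w:gz")
    try:
        for path in paths:
            # Directories are added recursively by default. Each entry is
            # stored under its base name rather than its full path.
            archive.add(path, arcname=os.path.basename(path))
    finally:
        archive.close()
    return archive_path


def list_archive(archive_path):
    """Return the sorted names of the entries stored in the archive."""
    archive = tarfile.open(archive_path, "r:gz")
    try:
        return sorted(archive.getnames())
    finally:
        archive.close()
```

Calling make_archive(["notes.txt", "photos"], "backup.tar.gz") with paths of your own would produce a single compressed file, much like running tar with gzip compression on a Unix system.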
Why tar and then compress? Why
not compress each file individually? By combining multiple files in one
large file, you gain higher compression ratios. Compression algorithms
compress data by finding redundancy. They have a better chance of finding
redundancy in one large file, rather than in several small files
individually. One large file is also easier to manage when
it comes to moving, copying, or otherwise handling the backup.
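To see the redundancy argument in action, the following sketch compresses fifty small, similar chunks of data one by one and then compresses them as a single combined blob. The sample data is made up, and plain concatenation with zlib (the algorithm family behind gzip) stands in for a real tar file, so treat the result as indicative only.

```python
import zlib


def compressed_size(data):
    """Return the number of bytes after zlib compression."""
    return len(zlib.compress(data))


# Fifty small, similar "files" (hypothetical sample data).
files = [
    ("user=alice; status=ok; record=%d\n" % i).encode("ascii")
    for i in range(50)
]

# Compress each piece on its own, then everything joined together.
separate = sum(compressed_size(f) for f in files)
combined = compressed_size(b"".join(files))

print("compressed one by one:     %d bytes" % separate)
print("combined, then compressed: %d bytes" % combined)
```

The combined figure comes out smaller for data like this because the compressor can reuse patterns it has already seen across file boundaries, and because each separately compressed file pays its own fixed overhead.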
The entire code for this sample is available at http://github.com/sriramk/azbackup. You’ll be seeing
snippets of code as this chapter progresses, but you can always look at
the entire source code. It is also very easy to set up and run, and should
work on Windows as well as any modern Unix system that has Python
support.
Note: The sample takes inspiration from the excellent tarsnap
service. If you’re really paranoid and want to delve into a real,
production backup service, tarsnap’s design makes for great reading.
2. Understanding Security
A primary challenge developers face is deciding how secure to
make an application. When you ask most people how secure they want their
data or their application, you’re going to hear superlative terms such as
impenetrable, totally secure,
and so on. When you hear someone say that, you should run as quickly as
you can in the opposite direction.
Unfortunately (or fortunately, for security consultants), there is
no completely secure system. In theory, you can imagine an application
running on an isolated computer with no network connections in an
underground bunker surrounded by thick concrete, protected by a small
army. And even that can’t be completely secure.
Security, like other things, is a spectrum in which you get to pick
where you want to be. If you’re building a small social bookmarking
service, your security needs are different than if you’re working for the
government building a system for the National Security Agency
(NSA).
For the sample backup application you’ll see in this chapter, you
will be as paranoid as possible. The goal is that your data should be
secure even if three-letter government agencies wanted to get to it. This
is overkill for most applications you’ll be building, so you can look at the
security techniques used in this chapter and pick which ones you want to
keep, and which ones you don’t care about.
Before you figure out how to secure something, you must know what
you are securing it from, and what secure even means. For the application in this
chapter, let’s define a few security properties that should hold at all
times.
The first property is secrecy. The data that you back up using this
application should not be in the clear either in transit or at rest. No one
apart from you should be able to get the data in plain form. Importantly, this
data should not be in the clear even with your cloud provider of choice (here,
Microsoft).
The second property is integrity. You must be able to verify instantly whether the
data you backed up has been tampered with in any way. As a bonus, it would
be nice if this verification were done with something bound to your
identity (also known as a digital signature).
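As a rough illustration of the integrity property, the sketch below tags data with a keyed hash (HMAC-SHA256) from the Python standard library and checks the tag later. Note that this swaps in a simpler technique: an HMAC relies on a shared secret key and is not a digital signature bound to an identity, which is why the chapter turns to a separate cryptographic library for signatures. The function names and the key are hypothetical, and hmac.compare_digest requires Python 2.7.7 or later (or 3.3 or later).

```python
import hashlib
import hmac


def make_tag(key, data):
    """Compute a hex-encoded HMAC-SHA256 tag over data using a secret key."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def is_untampered(key, data, expected_tag):
    """Return True only if data still matches the tag computed earlier."""
    # compare_digest takes time independent of where the inputs differ,
    # which avoids leaking information through timing.
    return hmac.compare_digest(make_tag(key, data), expected_tag)
```

You would compute the tag when the backup is created, store it alongside the backup, and recompute it on restore. Any change to the data, or use of the wrong key, makes is_untampered return False.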
The third property is the ability to verify your tools.
Essentially, you will be so paranoid that you don’t trust code you can’t
see in any layer charged with enforcing the previous two properties. This
means you will force yourself to stick to open source tools only. Note
that the fact that Microsoft is running some code in the data center is
irrelevant here, because the data is protected “outside” the data center,
and the blob service is used only as a very efficient byte
storage-and-transfer mechanism.
We will discuss the first two properties throughout the course of
this chapter. For the third property, you will stick with open source
tools that will work just as well on a non-Windows open source
platform.
To run the code in this chapter, you need two pieces of software.
The first is Python 2.5 or later, which you can download from http://www.python.org if you don’t already have it.
Note: Almost all modern *nix operating systems ship with some version of
Python. As of this writing, however, most did not ship with Python 2.5 or later.
To check the version of Python you have, run python --version at a command line.
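The same check can be made from inside Python, which is handy if a script should refuse to run on an interpreter that is too old. This is a small sketch; the helper name meets_minimum is hypothetical.

```python
import sys


def meets_minimum(version_info, minimum=(2, 5)):
    """Return True if the (major, minor) version is at least the minimum."""
    # Tuples compare element by element, so (2, 4) < (2, 5) < (3, 0).
    return tuple(version_info[:2]) >= minimum


if not meets_minimum(sys.version_info):
    sys.exit("Python 2.5 or later is required")
```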
Unfortunately, Python lacks some of the core cryptographic
functionality that is used in this chapter, so the second piece of
required software is an additional Python package called M2Crypto. You can find prebuilt versions for your operating
system of choice at http://chandlerproject.org/Projects/MeTooCrypto. This is a
popular Python library maintained by Heikki Toivonen that wraps around the OpenSSL tool set to
provide several cryptographic and security features.
Note: M2Crypto doesn’t have the greatest documentation in the world, but
since it is a thin wrapper around OpenSSL, you can often look up documentation for the
OpenSSL function of the same name.